Learning from Imbalanced Data in Presence of Noisy and Borderline Examples
Authors
Abstract
In this paper we studied re-sampling methods for learning classifiers from imbalanced data. We carried out a series of experiments on artificial data sets to explore the impact of noisy and borderline examples from the minority class on classifier performance. The results showed that when the data were sufficiently disturbed by these factors, the focused re-sampling methods – NCR and our SPIDER2 – strongly outperformed the oversampling methods. They were also better on real-life data, where PCA visualizations suggested the possible presence of noisy examples and large overlapping areas between classes.
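As a rough illustration of the kind of comparison the paper describes, the sketch below contrasts a plain oversampling method (SMOTE) with a focused cleaning method (NCR, the Neighbourhood Cleaning Rule) on a synthetic imbalanced data set with class overlap. It uses the imbalanced-learn package; SPIDER2 is not available in standard libraries and is not shown, and the data set, classifier, and all parameter values here are illustrative assumptions, not the experimental setup used in the paper.

```python
# Minimal sketch: oversampling (SMOTE) vs. focused cleaning (NCR) on synthetic
# imbalanced data with class overlap. Not the authors' experimental setup.
from collections import Counter

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import balanced_accuracy_score
from imblearn.over_sampling import SMOTE
from imblearn.under_sampling import NeighbourhoodCleaningRule

# Synthetic 90/10 imbalanced data; low class separation and label noise
# roughly mimic borderline and noisy minority examples.
X, y = make_classification(n_samples=2000, n_features=10, weights=[0.9, 0.1],
                           class_sep=0.5, flip_y=0.05, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

for name, sampler in [("SMOTE (oversampling)", SMOTE(random_state=0)),
                      ("NCR (focused cleaning)", NeighbourhoodCleaningRule())]:
    # Re-sample only the training split, then evaluate on untouched test data.
    X_res, y_res = sampler.fit_resample(X_train, y_train)
    clf = DecisionTreeClassifier(random_state=0).fit(X_res, y_res)
    pred = clf.predict(X_test)
    print(name, Counter(y_res), balanced_accuracy_score(y_test, pred))
```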
Similar papers
Managing Borderline and Noisy Examples in Imbalanced Classification by Combining SMOTE with Ensemble Filtering
Imbalanced data constitutes a great difficulty for most algorithms that learn classifiers. However, as recent works claim, class imbalance is not a problem in itself; performance degradation is also associated with other factors related to the data distribution, such as the presence of noisy and borderline examples in the areas surrounding class boundaries. This contribution proposes to extend...
SMOTE-IPF: Addressing the noisy and borderline examples problem in imbalanced classification by a re-sampling method with filtering
Classification datasets often have an unequal class distribution among their examples. This problem is known as imbalanced classification. The Synthetic Minority Over-sampling Technique (SMOTE) is one of the most well-known data pre-processing methods to cope with it and to balance the different number of examples of each class. However, as recent works claim, class imbalance is not a problem in itself...
An insight into classification with imbalanced data: Empirical results and current trends on using data intrinsic characteristics
Training classifiers with datasets which suffer from imbalanced class distributions is an important problem in data mining. This issue occurs when the number of examples representing the class of interest is much lower than that of the other classes. Its presence in many real-world applications has attracted growing attention from researchers. We shortly review the many issues in mach...
Impact of Local Data Characteristics on Learning Rules from Imbalanced Data
In this paper we discuss improving rule-based classifiers learned from class-imbalanced data. Standard learning methods often do not work properly with imbalanced data, as they are biased to focus on the majority classes while "disregarding" examples from the minority class. The class imbalance affects various types of classifiers, including rule-based ones. These difficulties include two g...
Extending rule based classifiers for dealing with imbalanced data
Many real-world applications involve learning from imbalanced data sets, i.e. data where the minority class of primary importance is under-represented in comparison to the majority classes. The high imbalance is an important obstacle for many traditional machine learning algorithms, as they are biased towards the majority classes. It is desired to improve the prediction of interesting minority class exampl...